class U_I18N_API BreakIterator

The BreakIterator class implements methods for finding the location of boundaries in text

Public Fields

static const UTextOffset DONE
DONE is returned by previous() and next() after all valid boundaries have been returned

Public Methods

virtual bool_t operator==(const BreakIterator&) const
Return true if another object is semantically equal to this one
virtual BreakIterator* clone(void) const
Return a polymorphic copy of this object
virtual UClassID getDynamicClassID(void) const
Return a polymorphic class ID for this object
virtual CharacterIterator* createText(void) const
Get the text for which this object is finding the boundaries
virtual void setText(const UnicodeString* it)
Change the text over which this operates to the given UnicodeString
virtual void adoptText(CharacterIterator* it)
Change the text over which this operates to the given CharacterIterator, which this object adopts
virtual UTextOffset first(void)
Return the index of the first character in the text being scanned
virtual UTextOffset last(void)
Return the index immediately BEYOND the last character in the text being scanned
virtual UTextOffset previous(void)
Return the boundary preceding the current boundary
virtual UTextOffset next(void)
Return the boundary following the current boundary
virtual UTextOffset current(void) const
Return the character index of the text boundary that was most recently returned by next(), previous(), first(), or last()
virtual UTextOffset following(UTextOffset offset)
Return the first boundary following the specified offset
virtual UTextOffset preceding(UTextOffset offset)
Return the first boundary preceding the specified offset
virtual bool_t isBoundary(UTextOffset offset)
Return true if the specified position is a boundary position
virtual UTextOffset next(int32_t n)
Return the nth boundary from the current boundary
static BreakIterator* createWordInstance(const Locale& where = Locale::getDefault())
Create BreakIterator for word-breaks using the given locale
static BreakIterator* createLineInstance(const Locale& where = Locale::getDefault())
Create a BreakIterator for line breaks using the specified locale
static BreakIterator* createCharacterInstance(const Locale& where = Locale::getDefault())
Create a BreakIterator for character breaks using the specified locale
static BreakIterator* createSentenceInstance(const Locale& where = Locale::getDefault())
Create a BreakIterator for sentence breaks using the specified locale
static const Locale* getAvailableLocales(int32_t& count)
Get the set of Locales for which break iterators are installed
static UnicodeString& getDisplayName(const Locale& objectLocale, const Locale& displayLocale, UnicodeString& name)
Get the name of the object for the desired Locale, in the desired language
static UnicodeString& getDisplayName(const Locale& objectLocale, UnicodeString& name)
Get the name of the object for the desired Locale, in the language of the default locale

Documentation

The BreakIterator class implements methods for finding the location of boundaries in text. BreakIterator is an abstract base class. Instances of BreakIterator maintain a current position and scan over text returning the index of characters where boundaries occur.

Line boundary analysis determines where a text string can be broken when line-wrapping. The mechanism correctly handles punctuation and hyphenated words.

Sentence boundary analysis allows selection with correct interpretation of periods within numbers and abbreviations, and trailing punctuation marks such as quotation marks and parentheses.

Word boundary analysis is used by search and replace functions, as well as within text editing applications that allow the user to select words with a double click. Word selection provides correct interpretation of punctuation marks within and following words. Characters that are not part of a word, such as symbols or punctuation marks, have word-breaks on both sides.

Character boundary analysis allows users to interact with characters as they expect, for example when moving the cursor through a text string. Character boundary analysis provides correct navigation through character strings, regardless of how each character is stored. For example, an accented character might be stored as a base character followed by a diacritical mark. What users consider to be a character can differ between languages.

BreakIterator is the common interface for all of these kinds of text boundaries.
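The model above (a current position plus a scan that reports boundary offsets) can be sketched without ICU. The following is a hypothetical toy, not the ICU implementation: ToyBreakIterator holds a precomputed, sorted list of boundary offsets and an index into it, and mimics the documented behavior of first(), last(), next(), previous() and current(). The value -1 for DONE is an assumption; this page does not give the actual value of BreakIterator::DONE.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a BreakIterator; not part of ICU.
// Boundaries are supplied up front as sorted offsets that must
// include 0 and the text length (so the list is never empty).
class ToyBreakIterator {
public:
    static constexpr int DONE = -1;  // assumed sentinel value

    explicit ToyBreakIterator(const std::vector<int>& boundaries)
        : bounds_(boundaries), pos_(0) {}

    // Move to the first boundary and return it.
    int first() { pos_ = 0; return bounds_[pos_]; }

    // Move to the last boundary (one past the last character) and return it.
    int last() { pos_ = bounds_.size() - 1; return bounds_[pos_]; }

    // The boundary most recently moved to.
    int current() const { return bounds_[pos_]; }

    // Advance to the following boundary, or return DONE at the end.
    int next() {
        if (pos_ + 1 >= bounds_.size()) return DONE;
        return bounds_[++pos_];
    }

    // Step back to the preceding boundary, or return DONE at the start.
    int previous() {
        if (pos_ == 0) return DONE;
        return bounds_[--pos_];
    }

private:
    std::vector<int> bounds_;
    std::size_t pos_;  // index into bounds_, not a text offset
};
```

With boundaries {0, 4, 8, 11}, first() followed by repeated next() calls yields 0, 4, 8, 11 and then DONE, while previous() from the first boundary returns DONE.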

Examples:

Helper function to output text

```cpp
// Print two boundary offsets and the text between them.
void printTextRange( BreakIterator& iterator, UTextOffset start, UTextOffset end )
{
    UnicodeString textBuffer, temp;
    // createText() returns a new CharacterIterator that the caller must delete.
    CharacterIterator *strIter = iterator.createText();
    strIter->getText(temp);
    cout << " " << start << " " << end << " |"
         << temp.extractBetween(start, end, textBuffer)
         << "|" << endl;
    delete strIter;
}
```
Print each element in order:

```cpp
void printEachForward( BreakIterator& boundary )
{
    UTextOffset start = boundary.first();
    // Each pair of consecutive boundaries delimits one element.
    for (UTextOffset end = boundary.next();
         end != BreakIterator::DONE;
         start = end, end = boundary.next())
    {
        printTextRange( boundary, start, end );
    }
}
```
Print each element in reverse order:

```cpp
void printEachBackward( BreakIterator& boundary )
{
    UTextOffset end = boundary.last();
    // Walk backward; each step yields the start of the previous element.
    for (UTextOffset start = boundary.previous();
         start != BreakIterator::DONE;
         end = start, start = boundary.previous())
    {
        printTextRange( boundary, start, end );
    }
}
```
Print first element

```cpp
void printFirst( BreakIterator& boundary )
{
    UTextOffset start = boundary.first();
    UTextOffset end = boundary.next();
    if (end != BreakIterator::DONE)   // DONE here means there is no element (e.g. empty text)
        printTextRange( boundary, start, end );
}
```
Print last element

```cpp
void printLast( BreakIterator& boundary )
{
    UTextOffset end = boundary.last();
    UTextOffset start = boundary.previous();
    if (start != BreakIterator::DONE)   // DONE here means there is no element (e.g. empty text)
        printTextRange( boundary, start, end );
}
```
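Print the element at a specified position

```cpp
// Print the element containing the character at offset pos.
void printAt( BreakIterator& boundary, UTextOffset pos )
{
    UTextOffset end = boundary.following(pos);   // first boundary after pos
    if (end == BreakIterator::DONE)              // pos is at or past the end of the text
        return;
    UTextOffset start = boundary.previous();     // boundary at or before pos
    printTextRange( boundary, start, end );
}
```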
Creating and using text boundaries

```cpp
void BreakIterator_Example( void )
{
    BreakIterator* boundary;
    UnicodeString stringToExamine("Aaa bbb ccc. Ddd eee fff.");
    cout << "Examining: " << stringToExamine << endl;

    // Print each sentence in forward and reverse order.
    boundary = BreakIterator::createSentenceInstance( Locale::US );
    // setText() is passed a pointer; keep stringToExamine alive while the iterator uses it.
    boundary->setText(&stringToExamine);
    cout << "----- forward: -----------" << endl;
    printEachForward(*boundary);
    cout << "----- backward: ----------" << endl;
    printEachBackward(*boundary);
    delete boundary;   // the caller owns iterators returned by the create...Instance() factories

    // Print each word in order (default locale).
    boundary = BreakIterator::createWordInstance();
    boundary->setText(&stringToExamine);
    cout << "----- forward: -----------" << endl;
    printEachForward(*boundary);
    // Print the first element.
    cout << "----- first: -------------" << endl;
    printFirst(*boundary);
    // Print the last element.
    cout << "----- last: --------------" << endl;
    printLast(*boundary);
    // Print the word containing character offset 10.
    cout << "----- at pos 10: ---------" << endl;
    printAt(*boundary, 10 );

    delete boundary;
}
```
virtual bool_t operator==(const BreakIterator&) const
Return true if another object is semantically equal to this one. The other object should be an instance of the same subclass of BreakIterator. Objects of different subclasses are considered unequal.

Return true if this BreakIterator is at the same position in the same text, and is the same class and type (word, line, etc.) of BreakIterator, as the argument. Text is considered the same if it contains the same characters; it need not be the same object, and styles are not considered.
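The equality rule above can be illustrated with a hypothetical value type (not ICU code) that records the kind of iterator, its own copy of the text, and the current offset. Comparing the text by content rather than by object identity is what lets two iterators over distinct but identical strings compare equal; styles never enter the comparison because this toy stores plain characters only. The enum is an assumed stand-in for ICU's use of distinct subclasses.

```cpp
#include <string>

// Hypothetical iterator kinds; ICU distinguishes these via subclasses.
enum class BreakKind { Character, Word, Line, Sentence };

struct ToyIteratorState {
    BreakKind kind;
    std::string text;  // compared by content, not by identity
    int position;      // current boundary offset
};

// Equal only if same kind, same characters, and same position.
bool operator==(const ToyIteratorState& a, const ToyIteratorState& b) {
    return a.kind == b.kind && a.text == b.text && a.position == b.position;
}
```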

virtual BreakIterator* clone(void) const
Return a polymorphic copy of this object. This is an abstract method which subclasses implement.

virtual UClassID getDynamicClassID(void) const
Return a polymorphic class ID for this object. Different subclasses will return distinct unequal values.
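One common C++ way to produce such IDs, assumed here rather than taken from this page, is to give each subclass a private static byte and use its address as the ID: distinct objects have distinct addresses, so the IDs of different classes never collide, while all instances of one class share the same ID. ToyClassID, ToyBase and the two subclasses below are hypothetical stand-ins for UClassID and BreakIterator subclasses.

```cpp
// Hypothetical stand-in for UClassID: an opaque pointer value.
typedef const void* ToyClassID;

class ToyBase {
public:
    virtual ~ToyBase() {}
    // Every instance of one concrete subclass returns the same value.
    virtual ToyClassID getDynamicClassID() const = 0;
};

class ToyWordIterator : public ToyBase {
public:
    static ToyClassID getStaticClassID() { return &fgClassID; }
    ToyClassID getDynamicClassID() const override { return getStaticClassID(); }
private:
    static char fgClassID;  // only its address matters
};
char ToyWordIterator::fgClassID = 0;

class ToyLineIterator : public ToyBase {
public:
    static ToyClassID getStaticClassID() { return &fgClassID; }
    ToyClassID getDynamicClassID() const override { return getStaticClassID(); }
private:
    static char fgClassID;  // a different object, hence a different address
};
char ToyLineIterator::fgClassID = 0;
```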

virtual CharacterIterator* createText(void) const
Return a new CharacterIterator over the text for which this object is finding boundaries. The caller owns the returned iterator and must delete it.

virtual void setText(const UnicodeString* it)
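Change the text over which this operates to the given UnicodeString. The current position is reset to the start of the text.

virtual void adoptText(CharacterIterator* it)
Change the text over which this operates to the given CharacterIterator, which this object adopts (takes ownership of). The current position is reset to the start of the text.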

static const UTextOffset DONE
DONE is returned by previous() and next() after all valid boundaries have been returned

virtual UTextOffset first(void)
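Move to the first boundary and return its index, which is the index of the first character in the text being scanned

virtual UTextOffset last(void)
Move to the last boundary and return its index, which is the position immediately BEYOND the last character in the text being scanned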
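Because last() returns the offset one past the final character, each pair of consecutive boundaries describes a half-open range [start, end). The hypothetical helper below, which uses std::string and plain int offsets in place of UnicodeString and UTextOffset, slices a text at such boundaries; concatenating the pieces reproduces the text exactly, with no character dropped or repeated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split text at sorted boundary offsets. Each piece covers the
// half-open range [bounds[i], bounds[i + 1]); the final boundary
// equals text.size(), i.e. one past the last character.
std::vector<std::string> splitAtBoundaries(const std::string& text,
                                           const std::vector<int>& bounds) {
    std::vector<std::string> pieces;
    for (std::size_t i = 0; i + 1 < bounds.size(); ++i) {
        pieces.push_back(text.substr(bounds[i], bounds[i + 1] - bounds[i]));
    }
    return pieces;
}
```

For "Aaa bbb." with boundaries {0, 3, 4, 7, 8}, the pieces are "Aaa", " ", "bbb" and ".".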

virtual UTextOffset previous(void)
Move to the boundary preceding the current boundary and return it
Returns:
The character index of the previous text boundary or DONE if all boundaries have been returned.

virtual UTextOffset next(void)
Move to the boundary following the current boundary and return it
Returns:
The character index of the next text boundary or DONE if all boundaries have been returned.

virtual UTextOffset current(void) const
Return the character index of the text boundary that was most recently returned by next(), previous(), first(), or last()
Returns:
The boundary most recently returned.

virtual UTextOffset following(UTextOffset offset)
Return the first boundary following the specified offset. The value returned is always greater than the offset, or is BreakIterator::DONE.
Returns:
The first boundary after the specified offset.
Parameters:
offset - the offset to begin scanning.
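With boundaries stored as a sorted array, following() reduces to finding the first element strictly greater than the offset, which std::upper_bound does directly. This is a sketch over a hypothetical precomputed boundary list, with kDone = -1 as an assumed stand-in for BreakIterator::DONE; a real iterator may compute boundaries on demand instead.

```cpp
#include <algorithm>
#include <vector>

const int kDone = -1;  // assumed stand-in for BreakIterator::DONE

// First boundary strictly greater than offset, or kDone if there is none.
int followingBoundary(const std::vector<int>& bounds, int offset) {
    auto it = std::upper_bound(bounds.begin(), bounds.end(), offset);
    return it == bounds.end() ? kDone : *it;
}
```

Note that an offset that is itself a boundary is skipped: with boundaries {0, 3, 4, 7, 8}, the boundary following 3 is 4.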

virtual UTextOffset preceding(UTextOffset offset)
Return the nearest boundary preceding the specified offset. The value returned is always less than the offset, or is BreakIterator::DONE.
Returns:
The nearest boundary before the specified offset.
Parameters:
offset - the offset to begin scanning.
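Symmetrically, preceding() is the nearest boundary strictly less than the offset. In the same hypothetical sorted-array setting, std::lower_bound finds the first boundary not less than the offset, and the element just before it, if any, is the answer. kDone = -1 is again an assumption.

```cpp
#include <algorithm>
#include <vector>

const int kDone = -1;  // assumed stand-in for BreakIterator::DONE

// Nearest boundary strictly less than offset, or kDone if there is none.
int precedingBoundary(const std::vector<int>& bounds, int offset) {
    auto it = std::lower_bound(bounds.begin(), bounds.end(), offset);
    return it == bounds.begin() ? kDone : *(it - 1);
}
```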

virtual bool_t isBoundary(UTextOffset offset)
Return true if the specified position is a boundary position
Returns:
True if "offset" is a boundary position.
Parameters:
offset - the offset to check.
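For a sorted boundary list, the isBoundary() question is a membership check, sketched here with std::binary_search. The helper name and the std::vector representation are assumptions for illustration; this toy only answers the question and does not model any effect the real method may have on the current position.

```cpp
#include <algorithm>
#include <vector>

// True if offset appears in the sorted list of boundary offsets.
bool isBoundaryOffset(const std::vector<int>& bounds, int offset) {
    return std::binary_search(bounds.begin(), bounds.end(), offset);
}
```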

virtual UTextOffset next(int32_t n)
Return the nth boundary from the current boundary
Returns:
The index of the nth boundary from the current position, or DONE if there are fewer than |n| boundaries in the specified direction.
Parameters:
n - which boundary to return. A value of 0 does nothing. Negative values move to previous boundaries and positive values move to later boundaries.
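The behavior of next(n) can be sketched as index arithmetic on the same kind of hypothetical sorted boundary list: move |n| entries from the current index, in the direction given by the sign of n, or report kDone (assumed to be -1) when fewer than |n| boundaries remain. Leaving the position unchanged on failure is a choice made for this sketch; the page does not say where the real iterator ends up in that case.

```cpp
#include <cstddef>
#include <vector>

const int kDone = -1;  // assumed stand-in for BreakIterator::DONE

// Move n boundaries from index pos (forward if n > 0, backward if n < 0).
// On success, update pos and return the boundary offset; if fewer than
// |n| boundaries exist in that direction, leave pos unchanged and return
// kDone. n == 0 just returns the current boundary.
int moveBoundaries(const std::vector<int>& bounds, std::size_t& pos, int n) {
    long target = static_cast<long>(pos) + n;
    if (target < 0 || target >= static_cast<long>(bounds.size())) return kDone;
    pos = static_cast<std::size_t>(target);
    return bounds[pos];
}
```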

static BreakIterator* createWordInstance(const Locale& where = Locale::getDefault())
Create a BreakIterator for word breaks using the given locale. Returns an instance of a BreakIterator implementing word breaks. A word BreakIterator is useful for word selection (e.g. selecting a word with a double click).
Returns:
A BreakIterator for word-breaks
Parameters:
where - the locale. If a specific WordBreak is not available for the specified locale, a default WordBreak is returned.
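To show the kind of rule a word instance applies, here is a deliberately simplified, ASCII-only sketch (a hypothetical function, not ICU's word rules): runs of letters and digits form words, and every other character, such as a space or punctuation mark, stands alone and so gets a boundary on both sides, as described in the overview. Real word iterators also handle punctuation inside words, such as apostrophes, which this toy does not.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy word boundaries (ASCII only): no boundary between two adjacent
// letters/digits; a boundary everywhere else, plus 0 and text.size().
std::vector<int> toyWordBoundaries(const std::string& text) {
    std::vector<int> bounds;
    bounds.push_back(0);
    for (std::size_t i = 1; i < text.size(); ++i) {
        bool prevWord = std::isalnum(static_cast<unsigned char>(text[i - 1])) != 0;
        bool currWord = std::isalnum(static_cast<unsigned char>(text[i])) != 0;
        if (!(prevWord && currWord)) bounds.push_back(static_cast<int>(i));
    }
    if (!text.empty()) bounds.push_back(static_cast<int>(text.size()));
    return bounds;
}
```

For "Hi, you" this yields {0, 2, 3, 4, 7}: the comma and the space are each bounded on both sides.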

static BreakIterator* createLineInstance(const Locale& where = Locale::getDefault())
Create a BreakIterator for line breaks using the specified locale. Returns an instance of a BreakIterator implementing line breaks. The boundaries it returns are logically possible line breaks; the actual line breaks are usually chosen based on display width. A line BreakIterator is useful for word-wrapping text.
Returns:
A BreakIterator for line-breaks
Parameters:
where - the locale. If a specific LineBreak is not available for the specified locale, a default LineBreak is returned.
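The distinction between possible and actual line breaks can be sketched with a toy: a hypothetical toyLineBreaks() marks a break opportunity after each run of spaces, and wrapGreedy() then picks, for each line, the furthest opportunity whose text (ignoring trailing spaces) fits a given width. Measuring width as one column per char is an assumption; real display width depends on fonts and scripts, and real line-break rules are far richer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy break opportunities: after each run of spaces, plus 0 and text.size().
std::vector<int> toyLineBreaks(const std::string& text) {
    std::vector<int> bounds{0};
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (text[i - 1] == ' ' && text[i] != ' ') bounds.push_back(static_cast<int>(i));
    }
    if (!text.empty()) bounds.push_back(static_cast<int>(text.size()));
    return bounds;
}

// Greedy wrap: extend each line to the furthest break opportunity whose
// text, ignoring trailing spaces, fits in maxWidth columns. A segment
// wider than maxWidth is placed on a line by itself.
std::vector<std::string> wrapGreedy(const std::string& text, std::size_t maxWidth) {
    std::vector<int> bounds = toyLineBreaks(text);
    std::vector<std::string> lines;
    std::size_t startIdx = 0;
    while (startIdx + 1 < bounds.size()) {
        std::size_t endIdx = startIdx + 1;  // always take at least one segment
        for (std::size_t j = startIdx + 2; j < bounds.size(); ++j) {
            std::string candidate = text.substr(bounds[startIdx], bounds[j] - bounds[startIdx]);
            std::size_t visible = candidate.find_last_not_of(' ') + 1;  // npos + 1 == 0
            if (visible > maxWidth) break;
            endIdx = j;
        }
        std::string line = text.substr(bounds[startIdx], bounds[endIdx] - bounds[startIdx]);
        line.erase(line.find_last_not_of(' ') + 1);  // drop trailing spaces
        lines.push_back(line);
        startIdx = endIdx;
    }
    return lines;
}
```

Wrapping "the quick brown fox" at width 10 gives "the quick" and "brown fox"; at width 5, each word lands on its own line.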

static BreakIterator* createCharacterInstance(const Locale& where = Locale::getDefault())
Create a BreakIterator for character breaks using the specified locale. Returns an instance of a BreakIterator implementing character breaks. Character breaks are boundaries of combining character sequences.
Returns:
A BreakIterator for character-breaks
Parameters:
where - the locale. If a specific character break is not available for the specified locale, a default character break is returned.
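A combining character sequence is a base character followed by any combining marks. The sketch below uses a hypothetical, heavily simplified test that treats only the Combining Diacritical Marks block (U+0300 to U+036F) as combining, and places a boundary before every other code point. It works on code point indices in a std::u32string, whereas the offsets on this page are positions in a UnicodeString's 16-bit units.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy combining-mark test: only U+0300..U+036F. Real character
// breaking relies on full Unicode character data.
bool isToyCombiningMark(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// A boundary falls before every code point that is not a combining
// mark, so a base character and the marks after it stay together.
std::vector<int> toyCharacterBoundaries(const std::u32string& text) {
    std::vector<int> bounds{0};
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (!isToyCombiningMark(text[i])) bounds.push_back(static_cast<int>(i));
    }
    if (!text.empty()) bounds.push_back(static_cast<int>(text.size()));
    return bounds;
}
```

For "e" followed by U+0301 (combining acute accent) and then "te", the first two code points form one user-visible character, giving boundaries {0, 2, 3, 4}.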

static BreakIterator* createSentenceInstance(const Locale& where = Locale::getDefault())
Create a BreakIterator for sentence breaks using the specified locale. Returns an instance of a BreakIterator implementing sentence breaks.
Returns:
A BreakIterator for sentence-breaks
Parameters:
where - the locale. If a specific SentenceBreak is not available for the specified locale, a default SentenceBreak is returned.
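A sentence rule can likewise be sketched in toy form (a hypothetical function, not ICU's rules): a sentence ends after '.', '!' or '?' when a space follows, and the trailing spaces stay with the sentence they close. This already avoids breaking inside a number such as 3.14, because no space follows that period, but unlike the behavior described in the overview it would wrongly break after abbreviations such as "Mr.".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy sentence boundaries: break after terminal punctuation that is
// followed by spaces, placing the boundary after those spaces.
std::vector<int> toySentenceBoundaries(const std::string& text) {
    std::vector<int> bounds{0};
    std::size_t i = 0;
    while (i < text.size()) {
        char c = text[i++];
        if ((c == '.' || c == '!' || c == '?') && i < text.size() && text[i] == ' ') {
            while (i < text.size() && text[i] == ' ') ++i;  // keep trailing spaces
            if (i < text.size()) bounds.push_back(static_cast<int>(i));
        }
    }
    if (!text.empty()) bounds.push_back(static_cast<int>(text.size()));
    return bounds;
}
```

For the example string "Aaa bbb ccc. Ddd eee fff." this gives {0, 13, 25}, i.e. two sentences, the first including its trailing space.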

static const Locale* getAvailableLocales(int32_t& count)
Get the set of Locales for which break iterators are installed
Returns:
available locales
Parameters:
count - output parameter that receives the number of elements in the returned locale array

static UnicodeString& getDisplayName(const Locale& objectLocale, const Locale& displayLocale, UnicodeString& name)
Get the name of the object for the desired Locale, in the desired language
Returns:
user-displayable name
Parameters:
objectLocale - must be from getAvailableLocales.
displayLocale - specifies the desired locale for output.
name - fill-in parameter that receives the result, which is also returned. Uses the best available match.

static UnicodeString& getDisplayName(const Locale& objectLocale, UnicodeString& name)
Get the name of the object for the desired Locale, in the language of the default locale
Returns:
user-displayable name
Parameters:
objectLocale - must be from getAvailableLocales.
name - the fill-in parameter of the return value


This class has no child classes.



this page has been generated automatically by doc++

(c)opyright by Malte Zöckler, Roland Wunderling
contact: doc++@zib.de